A Complexity 0(1) Priority Queue for Event Driven Molecular Dynamics Simulations 
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We propose and implement a priority queue suitable for use in event driven molecular dynamics 
simulations. All operations on the queue take on average 0(1) time per collision. In comparison, 
previously studied queues for event driven molecular dynamics simulations require 0(log A'^) time 
per collision for systems of A'^ particles. 



I. INTRODUCTION 

Molecular dynamics simulations are a powerful tool for 
determining the behavior of multiparticle systems and 
are used in a wide range of applications 0, S l3ll 

El El 

There are two basic approaches to these simulations: 

(i) Time driven simulations in which equations of 
motion of all particles are solved for a series of small 
time slices. The positions and velocities of the particles 
are determined at the end of each time slice and used as 
input to the calculation for the next time slice. 

(ii) Event driven simulations 0,|l,EiEil23 which are 
applicable to systems of hard spheres or more generally to 
systems with interparticle potentials which are piecewise 
constant. The approach with event driven simulations is 
to determine when the next collision between two par- 
ticles occurs, determine the positions and velocities of 
these particles after the collision and then repeat this 
process. A collision is defined as the event in which the 
hard spheres collide or more generally when two particles 
reach a discontinuity in their interparticle potential. 

We focus here on event driven simulations which, 
where applicable, provide exact results and typically run 
faster than time driven simulations. Determination of 
the next event is usually composed of two steps : 

(i) determination of the collision event with the short- 
est time for each particle. By dividing the system into 
cells and/or maintaining lists of particles within a cer- 
tain distance of a given particle (neighbor lists), the time 
taken for calculation of the first collision event for a given 
particle can be made independent of N, the total number 
of particles in the system |0, |0| . 

(ii) determination of the collision event with the short- 
est time among all the particles, given the events with 
the shortest time for each particle obtained in (i). Ap- 
proaches have been proposed and implemented which al- 
low this determination in 0(log A^) time. 

The subject of this paper is an approach to determining 
the next collision event among all particles. This has 
been a heavily researched subject |23, IH, [s^, 13- 
The requirements for a queue to allow this determination 
is as follows. The queue must support: 

(i) addition of an event to the queue; 
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(ii) identification and deletion from the queue of the 
event with the shortest collision time; 

(iii) deletion of a given event from the queue (e.g. when 
a collision (p,q) occurs we may want to remove the event 
(q,p) from the queue.) These requirements define ab- 
stractly the concept of a priority queue. 

Implementations of priority queues for molecular dy- 
namics simulations have for the most part been based on 
various types of binary trees. They all share the property 
that determining the event in the queue with the smallest 
value requires O(logAf) time "^^l. 

The early work on priority queues is reviewed in 
Ref. . The earliest implementations of priority queues 
used linked lists which results in 0{N) performance. 
Implementations with 0(iV'^'^)_performance were intro- 
duced and analyzed in Refs. [3 El El Ell. The old- 
est priority queue implementations with 0(log N) per- 
formance used implicit heaps binary trees in which each 
item always has a priority higher than its children and the 
tree is embedded in an array HilEll. Other O(logiV) 
im plem entations include leftist trees l24l. binomial queues 
[1 13 9|. pag odas [13 . , skew heavs |35l l3^ , splay trees 
[35II361 1371 and pairing heaps 0| . 

Marin et al. |27l |28j introduced a version of the com- 
plete binary tree which also has OilogN) performance 
and compared it to earlier priority queue implementa- 
tions explicitly in the context of molecular dynamics sim- 
ulations. They find that over a wide range of densities 
their complete binary tree variant has the best perfor- 
mance in terms of the coefficient of the log N term and 
large N behavior. 

In this work, we propose a priority queue for use in 
event driven molecular dynamics simulations for which 
all operations require 0(1) time. The approach is in- 
spired by the concept of a bounded priority queue which 
is typically implemented as an array of linear lists and 
which is applicable to problems in which the values asso- 
ciated with queue items are integers and are bounded (i.e. 
the values t associated with events obey a < t < b where 
a and b are constants). Bounded priority queues are not 
directly applicable to the molecular dynamics queueing 
problem because neither of these requirements are met. 

We show, however, that with a hybrid approach that 
employs both a normal priority queue and a bounded 
priority queue we can ensure all operations on the queue 
take 0(1) time. We make use of the facts that for molec- 
ular dynamics simulations: 

(i) The time associated with an event to be added to 



2 



the queue is always later than the time associated with 
the last event removed from the top of the queue. That 
is, 

t - tlast > 0, (1) 

where t is the time associated with the event to be added 
to the queue and tust is the last event removed from the 
top of the queue. 

(ii) There exists a constant, Ai^ax such that 

t — tlast < At 

max- (2) 

We call a priority queue which supports such events a 
BIPQ (Bounded Increasing Priority Queue). 

II. APPROACH 

The basic idea is to: 

(i) perform a gross sort of the events using an array of 
linear lists and 

(ii) to use a binary tree to perform a fine sort of only 
those events which are currently candidates for the event 
with the shortest time. 

More specifically, our priority queue is composed of the 
following components: 

1. An array, A, of n linear lists li, < i < n. (Section 
llVl below discusses how to determine the the size n of the 
array.) The array is treated in a circular manner. That 
is, the last linear list in the array is followed logically by 
the first linear list. We implement each linear list as a 
doubly linked list. 

2. A binary tree which is used to implement a conven- 
tional priority queue. 

We also maintain two additional quantities: the cur- 
rent index, i*, and io a base index associated with the 
queue. Initially, all linear lists and the binary tree are 
empty and i* and zq are 0. 

III. QUEUE OPERATION 

Here we describe how operations on the queue are im- 
plemented using the data structures described above. 

(i) Addition. Events are added to either one of the 
linear lists or to the binary tree as follows: An index i 
for the event to be added is determined by 

i = [s*t-io\, (3) 

where: t is the time associated with the event; io is the 
base index; and s is a scale factor the value of which is 
such the binary tree never contains more than a relatively 
small number of events ( « 10 — 20). If i is equal to the 
current index , i*, the event is added to the binary tree, 
otherwise it is added to linear list k. 

(ii) Identification of the event with shortest time. The 
event with the shortest time is simply the root of the 



binary tree, as is the case with a normal priority queue 
implemented using a binary tree. If a request is made for 
the event with the smallest time and the binary tree is 
empty, the current index is incremented by one (wrapping 
around to i* = if we reach the end of the array) and 
all events in the linear list Z^-are inserted in the binary 
tree. If there are none, we continue to increment i* until 
a non-empty linear list is found. If we wrap around to 
the beginning of the list, io is incremented by n. We find 
that in practice, when the binary tree becomes empty 
the next linear list is always non-empty (see Section llVI 
in which we show the distribution of event times). 

(iii) Deletion of an event. We simply delete the event 
from the array of linear lists or from the binary tree de- 
pending on the structure in which it is located. 

The fact that the time associated with an event to be 
added to the queue is always greater than or equal to the 
time associated with the last event removed allows us to 
use the array of linear lists in a circular fashion. 

The requirement that there exists a constant, At„iax 
such that 

t — tlast > Atmax (4) 

allows us to use a finite number of linear lists. The num- 
ber of linear lists required is proportional to Atmax- In 
practice wc find that we can always find a reasonable 
value of Atinax such that Eq. Q holds. If a rare event 
occurs which violates this constraint or we want to use 
less memory for linear lists causing the constraint to be 
violated, the event is handled on an exception basis as 
implemented in the processOverflowList function in code 
contained in the Appendix. Alternatively, the applica- 
tion which calls the priority queue code can guarantee 
that such an event never occurs by creating an earlier 
fictitious collision with a time which does not violate the 
constraint. 

Thus all of the events, except for those deleted be- 
fore they are placed in the binary tree, will eventually be 
added to the binary tree, but at any given time the tree, 
instead of containing 0{N) entries, will contain only a 
relatively small number of entries. The number of events 
maintained in the binary tree is only a fraction of the to- 
tal number of particles N in the system and can be made 
independent of N . 

Our priority queue is similar to a calendar queue j^; 
however, the calender queue does not employ a binary 
tree - events are sorted in each of the linear lists. 



IV. HOW TO CHOOSE PARAMETERS 

Two parameters, n the number of linear lists and s the 
scale factor, must be chosen to specify the implementa- 
tion of the queue. Operationally, they can be chosen as 
follows: 

(i) First, by instrumenting the queue to count the num- 
ber of events in the binary tree, determine a value of s 
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such that the number of events in the binary tree is rel- 
atively small (w 10 - 20). Table I. and Fig. EI a) sum- 
marize the values of s we have used for our simulations. 
The figure is consistent with a scale factor linear in N 
with a different coefficient of linearity dependent on den- 
sity. Because we use a binary tree to store events with 
the soonest times, the performance of the algorithm is 
somewhat insensitive to the choice of s. For example, a 
choice of s which results in a doubling of the number of 
events in the binary tree results in only one additional 
level in the tree. 



(ii) Instrument the queue to find Atmax, the maximum 
difference between the time associated with an event to 
be added and the time associated with the last event 
removed and set 

s* Aimax (5) 

to ensure that (Eq. I^J is met. Table I. and Fig. Ofb) 
summarize the number of linear lists n we have used for 
our simulations. As with s, n is linear in N with a differ- 
ent coefficient of linearity dependent on density. We note 
that while memory requirements are 0{N) as in the con- 
ventional implementation of priority queues, the hybrid 
implementation does require significantly more memory 
than the conventional implementation due to the mem- 
ory required for the linear lists. Tradefoffs can be made 
of cpu time for memory by increasing the scale factor 
and/or reducing the number of linear lists (resulting in 
more exception conditions). 



V. COMPLEXITY ANALYSIS 



The basic operations involved in the queue are: 

(i) insertion into and deletion from the linear lists. Use 
of doubly linked lists allows these operations to be im- 
plemented to take 0(1) time. 

(ii) binary tree operations. We use the code of Ref. 

to implement the binary tree operations. When a leaf 
representing an item in the priority queue is added to 
or deleted from the tree, the tree must be traversed from 
the affected leaf possibly all the way to the root node and 
adjustments made to reflect the presence or absence of 
the affected leaf. Thus a bound on the number of levels 
which must be traversed is log2 m where m is the number 
of items in the priority queue. In Sec. llVl we show that 
by choosing the scale factor s appropriately, m can be 
made to be independent of N (and have a relatively small 
value, « 10 — 20). Thus binary tree operations will be 
0(1). _ 

(iii) identification of the next non-empty linear list, 
after the current linear list is exhausted. As explained 
in item (ii) of Sec. IIIII when the binary tree is empty, 
we search forward through the array of linear lists until 
a non-empty list is found. If the number of lists we must 
search through increases with N, this process will not be 
0(1). We show below that with the proper choice of s, 
the number of lists we must search does not grow with N 
and in fact show that the next linear list after the current 
one almost always is non-empty. Thus the complexity of 
identification of the next non-empty list will be 0(1). 

Thus the overall time taken by queue operations per 
collision is 0(1). 



Figure ^a) plots (toj) the average number of events 
with index i versus i for various N. Here 

i = (i ~ i* + ri) mod n. (6) 

That is, i is the distance of i from the current index tak- 
ing into account the circular nature of the array of linear 
lists. The data was obtained by sampling the queue many 
times at regular intervals. With the choice of scale factors 
shown in Table I we achieve our goal of having w 10 — 20 
events with index i = i* and thus in the binary tree. 
Note that to achieve this, the scale factor increases with 
increasing N resulting in the cutoff of the distributions 
also increasing with increasing N. (In fact, if the x-axis 
is transformed by 2: = x/N, the plots collapse as shown 
in Fig. n^b) reflecting the fact that the probability dis- 
tribution of collision times is independent of iV.) Thus, 
the number of linear lists required to ensure that Eq. © 
holds also increases with N. In Fig. [3 we plot the distri- 
bution P{m*) the probability that the number of events 
with the current index, i* is m* versus m* . The distri- 
butions are strongly peaked indicating that the number 
of events in the binary tree do not vary much from the 
average. 



VI. EXPERIMENTS /SIMULATIONS 

We run simulations using both a conventional priority 
queue and our new hybrid approach. For simplicity the 
simulation was of identical size hard spheres of radius 
one and unit mass. The sizes L of the cubic systems are 
set to maintain equal densities. The parameters of the 
simulation are as shown in Table I. 

To demonstrate the performance of our approach, we 
run simulations for cubic systems at four volume densities 
p = 0.01, 0.12, 0.40 and 0.70. The first density represents 
a rarefied gas and the last density represents a jammed 
system. The jamming density for hard sphere systems 
is ~ 0.64 [Tll |. For both the conventional priority queue 
and the hybrid queue we used the binary tree code from 
Ref. m. 

Figure 0| shows the time taken for 10^ collisions for 
queue operations with both a conventional priority queue 
and the hybrid queue. As expected, the time for the 
conventional priority queue increases as logA'^ while the 
time for the hybrid queue is essentially constant. 

There is, however, a slight upward trend in the hybrid 
queue results. To determine if this trend is a feature of 
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the algorithm or of the the benchmark environment, we 
proceed as follows. 

We first study the only two places in the hybrid code 
where looping is involved: 

(i) in the updateCBT function of Ref. we loop as 
we traverse the binary tree. If we traverse more levels 
as the N grows, the algorithm will not be 0(1). To ex- 
plore this possibility, we instrument the function to count 
the number of number of levels we traverse in the tree. 
The results are shown in Fig. ^ The number of loop 
iterations is essentially constant, independent of N . 

(ii) in the deleteFirstFromEventQ function, after the 
priority queue for the current linear list is exhausted, 
we loop until we find a non-empty list. If the number 
of lists we must examine before we find the first non- 
empty list grows with the system size, the algorithm will 
not be 0(1). We examine this possibility by counting 
the number of times we encounter an empty list and find 
that on average the probability of encountering an empty 
list does not grow with N and that the probability of 
encountering an empty list is very small: we encounter 
an empty list only 10^* of the times after exhausting the 
priority queue. 

Having ruled out dependence of the number of loop 
iterations on N as the source of the upward trend in 
the execution times, we now consider whether the larger 
memory needed as N increases is the cause of the trend. 
All modern computer processors employ high speed cpu 
cache memory to reduce the average time to access mem- 
ory 0, 113 • In fact, the processor we use in our simu- 
lations, the AMD Opteron, employs a two-level memory 
cache (64 KB level 1 cache, 1 MB level 2 cache) j^]. A 
similar cache structure is used in the Intel Xeon proces- 
sor '2l'|. Because memory caches are finite size, if the 
memory access is random the larger the memory used 
by a program, the lower the probability that data will 
be found in the cache resulting in slower instruction ex- 
ecution. The effect of cache in benchmark runtimes has 
been studied in Ref. [s^. We study the effect of the 
finite size of the cache in our system as follows: Instead 
of running the molecular dynamics simulations, we run 
a small test program which randomly accesses the data 
structures used by the molecular dynamics simulations. 
For each value of N, the test program executes exactly 
the same number of instructions but uses data structures 
of the size used by the molecular dynamics simulations 
for that value of N. The results are shown in Fig. 0] and 
show an upward trend similar to that of the simulation 
results for all of the densities studied. 

The above results thus suggest that the complexity of 
the hybrid algorithm is, in fact, 0(1) and that the upward 
trend in the results is due to the finite size of the high 
speed memory cache. 



VII. DISCUSSION AND SUMMARY 

We have defined a new abstract data type, the 
Bounded Increasing Priority Queue (BIPQ) having the 
same operations as a conventional priority queue but 
which takes advantage of the fact that the value asso- 
ciated with an item to be added to the queue has the 
properties that: (i) the value is greater than or equal 
to the value associated with the last item removed from 
the top of the queue and (ii) the value minus the value 
of the last item removed from the top of the queue is 
bounded. These properties are obeyed for events in event 
driven molecular dynamic simulations. We implement a 
BIPQ using a hybrid approach incorporating a conven- 
tional priority queue (implemented with a binary tree) 
and a bounded priority queue. All operations on the 
BIPQ take an average 0(1) time per collision. This type 
of queue should provide performance speedups for molec- 
ular dynamics simulations in which the event queue is the 
bottleneck. 
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APPENDIX 

The following code implements the hybrid queue pro- 
posed here. The calls to Insert and Delete are to the 
functions contained in Ref. 28j, which update NP and 
the complete binary tree, CBT. Any code providing the 
same functions could be substituted for Insert and Delete. 



#define nlists 50000 
#define scale 50 

typedef struct 
{ 

int next ; 
int previous; 
int p; 
int q; 
int c ; 
double t; 

unsigned int qMarker; 
int qlndex; 
statusType status; 
}eventQEntry ; 

eventQEntry * eventQEntries ; 
double baselndex; 

int * CBT; /* complete binary tree 

implemented in an array of 
2*N integers */ 

int NP; /*current number of particles*/ 

int linearLists [nlists+1] ; for overflow*/ 
int currentlndex ; 

// 

int insertlnEventQ (int p) 
{ 

int i.oldFirst; 
eventQEntry * pt ; 

pt=eventQEntries+p; /* use pth entry */ 

i=(int) (scale*pt->t-baselndex) ; 

if (i>(nlists-l) ) /* account for wrap */ 

{ 

i-=nlists ; 

if (i>=currentlndex-l) 
{ 

i=nlists; /* store in overflow list */ 

} 

} 

pt->qlndex=i ; 

if (i==currentlndex) 
{ 

Insert (p); /* insert in PQ */ 



> 

else 
{ 

/* insert in linked list */ 

oldFirst=linearLists [i] ; 
pt->previous=-l ; 
pt->next=oldFirst ; 
linearLists [i] =p; 

if (oldFirst !=-l) 

eventQEntries [oldFirst] .previous=p; 

} 

return p; 

// 

processOverf lowList () 
{ 

int i,e,eNext; 

i=nlists; /* overflow list */ 
e=linearLists [i] ; 

linearLists [i] =-1 ; /* mark empty; we will 

treat all entries and may re-add some */ 
while(e!=-l) 
i 

eNext=eventQEntries [e] .next ; /* save next */ 
insertlnEventQ (e) ; /* try add to regular list now */ 
e=eNext ; 

> 



// 

void deleteFromEventQ(int e) 
{ 

int prev,next,i; 

eventQEntry * pt=eventQEntries+e ; 

i=pt->qlndex; 

if (i==currentlndex) 

Delete (e); /* delete from pq */ 
else 
{ 

/* remove from linked list */ 

prev=pt->previous ; 
next=pt->next ; 
if (prev==-l) 

linearLists [i] =pt->next ; 
else 

eventQEntries [prev] .next=next; 

if (next !=-l) 

eventQEntries [next] .previous=prev; 

> 

} 



6 



11- 



int deleteFirstFromEventQO 
{ 

int e; 

while (NP==0)/*if priority queue exhausted*/ 
{ 

/ * change current index */ 

current Index++ ; 

if (currentlndex==nlists) 

{ 

current lndex=0 ; 
baselndex+=nlists ; 
processOverf lowListO ; 

} 



e=linearLists [currentlndex] ; 

while (e!=-l) 

{ 

Insert (e) ; 

e=eventQEntries [e] .next; 

} 

linearLists [currentlndex] =-1 ; 



e=CBT[l] ; /* root contains shortest 
time entry */ 



Delete(CBT[l]) ; 
return e; 



} 

II- 



I* populate pq */ 
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TABLE I: Parameters of molecular dynamics simulations. 



Number 


Scale Factor 


Number 


of Particles 




of lists 


N 


s 


n 


p = OM 


1000 


100 


25000 


8000 


700 


200000 


64000 


5000 


2.5 X 10'^ 


512000 


45000 


25 X 10*^ 


p = 0.12 


1000 


50 


50000 


8000 


500 


400000 


64000 


3400 


5 X 10^ 


512000 


25000 
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FIG. 1: (a) For p — 0.12, the average number of events (m^) 
with index i versus i(the distance of i from the current index 
r ) for (from left to right Ar=1000, 8000, and 64000. (b) Same 
as (a) with the x-a:xis scaled by which results in a collapse 
of the plots. 
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FIG. 2: For p = 0.12, P{m*), the probability that the 
number of events in linear list i = is m*, vs. m* for 
N =1000(squares), 8000(triangies), and 64000(disks). 
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FIG. 3: (a) Scale factor, s, vs A'^ for (from bottom to top) 
p = 0.12,0.01,0.4 and 0.7. (b) Number of linear lists, n, vs 
N for (from bottom to top) p = 0.01, 0.12, 0.4 and 0.7. 
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FIG. 4: Processing time for queue operations vs. A'^, the number of particles in the system, (a) Volume density p = 0.01. 
The higher solid line is the processing time for queue operation for a normal priority queue; the lower solid line is for the 
hybrid queuing system introduced here. The dashed line represents the benchmark test timing to execute a fixed number of 
instructions independent of A'' but with memory sizes corresponding to the memory used for the hybrid system. The dotted 
line represents the number of tree levels traversed (xlO~^) in the binary tree for the hybrid system. (b),(c) and (d) Same as 
(a) for p = 0.12,0.4, and 0.7, respectively. 



